Throughout, we will consider different geographic units, corresponding to various collections of counties, and other unified geographical areas corresponding to different parts of the landscape.
core_counties <- c(
"Fulton", "DeKalb", "Gwinnett", "Cobb", "Clayton"
)
arc_counties <- c(
"Fulton", "DeKalb", "Gwinnett", "Cobb", "Clayton", "Forsyth",
"Cherokee", "Douglas", "Fayette", "Henry", "Rockdale"
)
atlanta_msa_1 <- c(
'Fulton', 'Gwinnett',
'Cobb', 'DeKalb', 'Clayton',
'Cherokee', 'Forsyth',
'Henry', 'Paulding', 'Coweta',
'Douglas', 'Fayette', 'Carroll',
'Newton', 'Bartow', 'Walton',
'Rockdale'
)
atlanta_msa_2 <- c(
'Barrow', 'Spalding', 'Pickens',
'Haralson', 'Dawson', 'Butts',
'Meriwether', 'Morgan',
'Pike', 'Lamar', 'Jasper', 'Heard'
)
atlanta_msa <- c(atlanta_msa_1, atlanta_msa_2)Throughout, we will frequently use ACS data; for ACS 1 year estimates, we use data from 2009 to 2019 (we note that 2020 data has not been released as a result of the COVID pandemic adversely affecting data collection), and for ACS 5 year estimates, we use data from 2009 to 2020.
years_acs1 <- lst(
2009, 2010, 2011, 2012, 2013, 2014,
2015, 2016, 2017, 2018, 2019
)
years_acs5 <- lst(
2009, 2010, 2011, 2012, 2013, 2014,
2015, 2016, 2017, 2018, 2019, 2020
)Here, we will consider the I-285 ringroad as defining the “perimeter” of the “main” part of Atlanta, and will consider anything inside of this as being “inside the perimeter” (or ITP), and anything outside as “outside the perimeter” (or OTP). The shape file for the ringroad is available online.
highways <- sf::st_read("../data/georgia-highways.geojson")## Reading layer `Expressways_Georgia' from data source `/mnt/c/Users/adavi/Documents/comp-journal/ajc-satellite/data/georgia-highways.geojson' using driver `GeoJSON'
## Simple feature collection with 44 features and 10 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: -85.55382 ymin: 30.62627 xmax: -81.09842 ymax: 34.98588
## Geodetic CRS: WGS 84
it_285 <- highways %>%
filter(str_detect(ROAD_NAME, "285")) %>%
select(ROAD_NAME, geometry)
it_285 <- it_285[c(1, 3), ]
itp <- st_union(st_convex_hull(it_285))As is visible from the plot below, the ringroad does not necessarily line up nicely with the county divisions within Georgia: the perimeter is highlighted in red, with the regular counties highlighted in black. However, the census tracts (in grey) do line up more nicely with the perimeter, albeit not exactly.
arc_counties_shp <- sf::st_read("../ga-counties/Counties_Georgia.shp") %>%
filter(NAME10 %in% arc_counties) %>%
select(NAME10, COUNTYFP10, geometry)## Reading layer `Counties_Georgia' from data source `/mnt/c/Users/adavi/Documents/comp-journal/ajc-satellite/ga-counties/Counties_Georgia.shp' using driver `ESRI Shapefile'
## Simple feature collection with 159 features and 20 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -85.60517 ymin: 30.35576 xmax: -80.75143 ymax: 35.00067
## Geodetic CRS: WGS 84
arc_tracts_shp <- sf::st_read("../ga-tracts-2020/tl_2020_13_tract.shp") %>%
filter(COUNTYFP %in% arc_counties_shp$COUNTYFP10)## Reading layer `tl_2020_13_tract' from data source `/mnt/c/Users/adavi/Documents/comp-journal/ajc-satellite/ga-tracts-2020/tl_2020_13_tract.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2796 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -85.60516 ymin: 30.35576 xmax: -80.78296 ymax: 35.0013
## Geodetic CRS: NAD83
highway_arc <- st_intersection(highways, arc_counties_shp)
ggplot() +
geom_sf(data = arc_tracts_shp, colour = "grey", fill = NA) +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = itp, colour = "red", fill = NA)Ultimately, we want to try and assign each census tract to being ITP or OTP. Firstly, to handle this, we count the number of tracts (where we use the 2020 Census Tracts) which intersect with the “perimeter” itself.
ics <- st_intersects(
st_transform(st_boundary(itp), crs = 4269),
arc_tracts_shp
)
print(paste(
"Number of census tracts intersecting the interstate:",
length(ics[[1]])
))## [1] "Number of census tracts intersecting the interstate: 64"
As we can see, a small but non-zero number fall inside of this category, and so we will keep track of:
arc_tracts_shp <- arc_tracts_shp %>%
select(NAME, COUNTYFP, TRACTCE, geometry) %>%
mutate(
tp = st_intersects(
st_transform(st_boundary(itp), crs = 4269),
geometry, sparse = FALSE
)[1, ],
itp = st_within(
geometry,
st_transform(itp, crs = 4269), sparse = FALSE
)[, 1],
otp = !tp & !itp
)
arc_tracts_shp %>%
st_drop_geometry() %>%
summarize(
tp = sum(tp), itp = sum(itp), otp = sum(otp)
)arc_tracts_shp2 <- arc_tracts_shp %>%
pivot_longer(
cols = c(tp, itp, otp),
names_to = "class",
values_to = "in_class"
) %>%
filter(in_class == TRUE) %>%
select(-in_class)
ggplot() +
geom_sf(
data = arc_tracts_shp2, colour = "grey",
mapping = aes(fill = class)
) +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = itp, colour = "red", fill = NA) +
coord_sf(xlim = c(-84.6, -84), ylim = c(33.4, 34), expand = TRUE)ACS 1-year data from 2009 to 2019, showing the number of households (with error estimates) which lie in particular income bands across the ARC counties.
df_income_distributionTo answer this, we summarize over all the counties the number of households in each income bracket, along with the total number, and do so for each year. The margin of errors give 90% confidence intervals of each estimate; to get the margin of error for a sum of variables, we sum the square of the margin of error across all the variables, and then take the square root. For this analysis, we look at all the counties which belong to the Altanta Regional Commission.
df_income_dist_summary <- df_income_distribution %>%
select(year, county, variable, estimate, moe) %>%
group_by(year, variable) %>%
summarize(
estimate = sum(estimate),
moe = sqrt(sum(moe**2))
) %>%
ungroup() %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
filter(variable != "Total")
df_income_dist_summaryggplot(df_income_dist_summary, aes(x = variable, y = estimate)) +
facet_wrap(~ year) +
geom_col() +
geom_errorbar(aes(ymin = estimate - moe, ymax = estimate + moe)) +
labs(
x = "Income band", y = "Number of households",
title = "Distribution of Household Income in the Atlanta area Between 2009 and 2019"
) + coord_flip()df_income_prop_summary <- df_income_distribution %>%
select(year, county, variable, estimate, moe) %>%
group_by(year, variable) %>%
summarize(
estimate = sum(estimate),
moe = sqrt(sum(moe**2))
) %>%
ungroup() %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
filter(variable != "Total") %>%
group_by(year) %>%
mutate(
moe = moe / sum(estimate),
estimate = estimate / sum(estimate),
)
df_income_prop_summaryggplot(df_income_prop_summary, aes(x = variable, y = estimate)) +
facet_wrap(~ year) +
geom_col() +
geom_errorbar(aes(ymin = estimate - moe, ymax = estimate + moe)) +
labs(
x = "Income band", y = "Proportion of households",
title = "Distribution of Household Income in the Atlanta area Between 2009 and 2019"
) + coord_flip()Looking at the number of households in each income band at 2009 and 2019, we can see that there has been a large increase in the number of households earning $100k or more, while the number of households earning less than $50k has mostly stayed constant.
df_income_dist_summary %>%
filter(year %in% c(2009, 2019)) %>%
ggplot(aes(x = variable, y = estimate, fill = year)) +
geom_col(position = "dodge") +
geom_errorbar(
aes(ymin = estimate - moe, ymax = estimate + moe),
position = "dodge"
) +
labs(
y = "Proportion of households", x = "Income band",
title = "Distribution of Household Income in the Atlanta area (2009 and 2019)"
) +
coord_flip() df_income_distribution %>%
select(year, county, variable, estimate, moe) %>%
filter(variable != "Total") %>%
group_by(year, county) %>%
mutate(estimate = estimate / sum(estimate)) %>%
ungroup() %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
ggplot(aes(x = year, y = estimate, fill = forcats::fct_rev(variable))) +
facet_wrap(~ county) +
geom_col() + coord_flip() +
labs(
x = "Year", y = "Proportion of Households", fill = "Income Band",
title = "Distribution of Household Income across Counties and Year"
) +
theme(legend.position = "bottom")We note that between the 2009 5-year ACS estimates and the 2020 5-year ACS estimates, the number of census tracts has changed, and therefore we will compute for each census tract (in each year period) whether it was ITP, TP or OTP.
df_income_tracts <- map_dfr(
lst(2009, 2020),
~ get_acs(
geography = "tract",
state = "GA",
variables = income_var_list,
year = .x,
county = arc_counties,
survey = "acs5",
geometry = FALSE
),
.id = "year") %>%
mutate(
tract = gsub("^(.*?),.*", "\\1", NAME),
county = gsub("^(.*?), (.*?),.*", "\\2", NAME)
) %>%
select(-NAME) %>%
relocate(tract, county, .after = GEOID)
df_tracts <- map_dfr(
lst(2009, 2020),
~ get_acs(
geography = "tract",
state = "GA",
variables = "B19001_001",
year = .x,
county = arc_counties,
survey = "acs5",
geometry = TRUE
),
.id = "year") %>%
select(year, GEOID, geometry) %>%
mutate(
tp = st_intersects(
st_transform(st_boundary(itp), crs = 4269),
geometry, sparse = FALSE
)[1, ],
itp = st_within(
geometry,
st_transform(itp, crs = 4269), sparse = FALSE
)[, 1],
otp = !tp & !itp
) %>%
pivot_longer(
cols = c(tp, itp, otp),
names_to = "class",
values_to = "in_class"
) %>%
filter(in_class == TRUE) %>%
select(-in_class)
df_tracts_peri <- df_tracts %>% st_drop_geometry()
df_income_tracts %>%
filter(variable != "Total") %>%
left_join(df_tracts_peri) %>%
select(year, class, variable, estimate, moe) %>%
group_by(year, class, variable) %>%
summarize(
estimate = sum(estimate),
moe = sqrt(sum(moe**2))
) %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
group_by(year, class) %>%
mutate(
moe = moe / sum(estimate),
estimate = estimate / sum(estimate)
) %>%
ggplot(aes(x = variable, y = estimate, fill = year)) +
geom_col(position = "dodge") +
geom_errorbar(
aes(ymin = estimate - moe, ymax = estimate + moe),
position = "dodge"
) +
facet_grid(~ class) +
labs(x = "Income band", y = "Proportion of population") +
coord_flip()Examining the three different classes, it appears that the main changes are that there are an increasing number of households in very high income brackets both inside and outside the perimeter, and that the number of households on the perimeter which were in the low income brackets has decreased over the past ten years by a considerable amount. (If anything, this seems like the most suprising thing I’ve found here.)
df_income_tracts %>%
filter(variable != "Total") %>%
left_join(df_tracts_peri) %>%
select(year, class, variable, estimate, moe) %>%
group_by(year, class, variable) %>%
summarize(
estimate = sum(estimate),
moe = sqrt(sum(moe**2))
) %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
ggplot(aes(x = variable, y = estimate, fill = year)) +
geom_col(position = "dodge") +
geom_errorbar(
aes(ymin = estimate - moe, ymax = estimate + moe),
position = "dodge"
) +
facet_grid(~ class, scales = "free") +
labs(x = "Income band", y = "Proportion of population") +
coord_flip()To compare the income distribution across tracts over time, for each census tract, we will look at the proportion of the population in the different income bands, and then compare these between 2010 and 2019. (These time periods were chosen so that there are no issues in switching between different census tracts.) To do so, we will compute the total variation distance (up to a scalar) between the two distributions; formally, if we have \(K\) different possible values, and we have two sets of proportions labelled \(p(i)\) and \(q(i)\) (where \(i\) can take on \(1\), \(2\), …, or \(K\)), then we compute the value \[ |p(1) - q(1)| + |p(2) - q(2)| + \cdots + |p(K) - q(K)|, \] and use this as a measure of how different \(p\) and \(q\) are. To try and observe what is happening in relation to some other geographical structures, the major highways beyond that of the interstate have also been added to the plots below. (Apologies for the color blind unfriendly palette.)
df_income_tracts <- map_dfr(
lst(2010, 2019),
~ get_acs(
geography = "tract",
state = "GA",
variables = income_var_list,
year = .x,
county = arc_counties,
survey = "acs5",
geometry = FALSE
),
.id = "year") %>%
mutate(
tract = gsub("^(.*?),.*", "\\1", NAME),
county = gsub("^(.*?), (.*?),.*", "\\2", NAME)
) %>%
select(-NAME) %>%
relocate(tract, county, .after = GEOID)
df_tracts <- map_dfr(
lst(2010),
~ get_acs(
geography = "tract",
state = "GA",
variables = "B19001_001",
year = .x,
county = arc_counties,
survey = "acs5",
geometry = TRUE
),
.id = "year") %>%
select(GEOID, geometry)
df_income_change_tracts <- df_income_tracts %>%
filter(variable != "Total") %>%
select(year, GEOID, variable, estimate) %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
group_by(year, GEOID) %>%
mutate(estimate = estimate / sum(estimate)) %>%
pivot_wider(
names_from = year,
names_prefix = "estimate_",
values_from = estimate
) %>%
mutate(
diff = abs(estimate_2010 - estimate_2019)
) %>%
group_by(GEOID) %>%
summarize(income_dist_change = sum(diff)) %>%
mutate(
rank = cume_dist(income_dist_change),
rank_dis = ntile(income_dist_change, 10)
) %>%
left_join(df_tracts)
df_income_change_tracts %>%
select(-geometry) %>%
ggplot(aes(x = 0.5 * income_dist_change)) +
geom_histogram() +
labs(
x = "Total variation distance between income distributions in 2010 and 2019",
title = "Histogram of total variation distances between income distributions across census tracts"
)ggplot() +
geom_sf(
data = df_income_change_tracts,
aes(geometry = geometry, fill = 0.5 * income_dist_change),
colour = "grey"
) + scale_fill_viridis(option = "cividis") +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = highway_arc, colour = "green", fill = NA) +
theme(legend.position = "bottom") +
labs(fill = "Total variation of income distributions")ggplot() +
geom_sf(
data = df_income_change_tracts %>%
filter(income_dist_change > 0.7),
aes(geometry = geometry, fill = 0.5 * income_dist_change),
colour = "grey"
) + scale_fill_viridis(option = "cividis") +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = highway_arc, colour = "green", fill = NA) +
theme(legend.position = "bottom") +
labs(fill = "Total variation of income distributions")ggplot() +
geom_sf(
data = df_income_change_tracts,
aes(geometry = geometry, fill = rank_dis),
colour = "grey"
) + scale_fill_viridis(option = "cividis") +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = highway_arc, colour = "green", fill = NA) +
theme(legend.position = "bottom") +
labs(fill = "Binned ranking of total variation of income distributions across tracts")This seems pretty abstract, and so instead we’ll look at some “extreme groups”:
and see where the numbers of these households/proportions have been increasing and decreasing in number the largest.
df_rdincome_tracts <- df_income_tracts %>%
filter(variable != "Total") %>%
select(year, GEOID, variable, estimate) %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
mutate(variable = fct_collapse(variable,
"Less than $50,000" = c(
"Less than $10,000", "$10,000 to $14,999",
"$15,000 to $19,999", "$20,000 to $24,999",
"$25,000 to $29,999", "$30,000 to $34,999",
"$35,000 to $39,999", "$40,000 to $44,999",
"$45,000 to $49,999"
),
"$50,000 to $74,999" = c("$50,000 to $59,999", "$60,000 to $74,999"),
"$75,000 to $99,999" = c("$75,000 to $99,999"),
"$100,000 to $149,999" = c(
"$100,000 to $124,999", "$125,000 to $149,999"
),
"$150,000 to $199,999" = c("$150,000 to $199,999"),
"$200,000 or more" = c("$200,000 or more")
)) %>%
group_by(year, GEOID, variable) %>%
summarize(estimate = sum(estimate)) %>%
left_join(df_tracts)
ggplot() +
geom_sf(
data = df_rdincome_tracts %>% filter(year == '2010'),
aes(geometry = geometry, fill = estimate),
colour = "grey", lwd = 0.25
) + scale_fill_viridis(option = "viridis", trans = 'sqrt') +
facet_wrap(~ variable) +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = highway_arc, colour = "#f100e5", fill = NA) +
theme(legend.position = "bottom") +
labs(
fill = "Number of households",
title = "Number of households by income band and tract",
subtitle = "(Data using 5-year estimates from the 2006-2010 ACS)"
) +
theme(legend.key.width = unit(2, 'cm'))ggplot() +
geom_sf(
data = df_rdincome_tracts %>% filter(year == '2019'),
aes(geometry = geometry, fill = estimate),
colour = "grey", lwd = 0.25
) + scale_fill_viridis(option = "viridis", trans = 'log10') +
facet_wrap(~ variable) +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = highway_arc, colour = "#f100e5", fill = NA) +
theme(legend.position = "bottom") +
labs(
fill = "Number of households",
title = "Number of households by income band and tract",
subtitle = "(Data using 5-year estimates from the 2015-2019 ACS)"
) +
theme(legend.key.width = unit(2, 'cm'))df_rdincome_change_tracts <- df_income_tracts %>%
filter(variable != "Total") %>%
select(year, GEOID, variable, estimate) %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
mutate(variable = fct_collapse(variable,
"Less than $50,000" = c(
"Less than $10,000", "$10,000 to $14,999",
"$15,000 to $19,999", "$20,000 to $24,999",
"$25,000 to $29,999", "$30,000 to $34,999",
"$35,000 to $39,999", "$40,000 to $44,999",
"$45,000 to $49,999"
),
"$50,000 to $74,999" = c("$50,000 to $59,999", "$60,000 to $74,999"),
"$75,000 to $99,999" = c("$75,000 to $99,999"),
"$100,000 to $149,999" = c(
"$100,000 to $124,999", "$125,000 to $149,999"
),
"$150,000 to $199,999" = c("$150,000 to $199,999"),
"$200,000 or more" = c("$200,000 or more")
)) %>%
group_by(year, GEOID, variable) %>%
summarize(estimate = sum(estimate)) %>%
pivot_wider(
names_from = "year",
names_prefix = "estimate_",
values_from = "estimate"
) %>%
mutate(change = estimate_2019 - estimate_2010) %>%
left_join(df_tracts)
df_rdincome_change_tracts %>%
ggplot(aes(x = change)) + geom_histogram() +
facet_wrap(~ variable)ggplot() +
geom_sf(
data = df_rdincome_change_tracts,
aes(geometry = geometry, fill = change),
colour = "grey", lwd = 0.25
) + scale_fill_viridis(option = "turbo") +
facet_wrap(~ variable) +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = highway_arc, colour = "#009b00", fill = NA) +
theme(legend.position = "bottom") +
labs(
fill = "Net change in households",
title = "Change in number of households by income band and tract",
subtitle = "(Data using 5-year estimates from the 2006-2010 ACS and 2015-2019 ACS)"
) +
theme(legend.key.width = unit(2, 'cm'))df_rdincome_prop_tracts <- df_income_tracts %>%
filter(variable != "Total") %>%
select(year, GEOID, variable, estimate) %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
mutate(variable = fct_collapse(variable,
"Less than $50,000" = c(
"Less than $10,000", "$10,000 to $14,999",
"$15,000 to $19,999", "$20,000 to $24,999",
"$25,000 to $29,999", "$30,000 to $34,999",
"$35,000 to $39,999", "$40,000 to $44,999",
"$45,000 to $49,999"
),
"$50,000 to $74,999" = c("$50,000 to $59,999", "$60,000 to $74,999"),
"$75,000 to $99,999" = c("$75,000 to $99,999"),
"$100,000 to $149,999" = c(
"$100,000 to $124,999", "$125,000 to $149,999"
),
"$150,000 to $199,999" = c("$150,000 to $199,999"),
"$200,000 or more" = c("$200,000 or more")
)) %>%
group_by(year, GEOID, variable) %>%
summarize(estimate = sum(estimate)) %>%
group_by(year, GEOID) %>%
mutate(estimate = estimate / sum(estimate)) %>%
left_join(df_tracts)
ggplot() +
geom_sf(
data = df_rdincome_prop_tracts %>% filter(year == '2010'),
aes(geometry = geometry, fill = estimate),
colour = "grey", lwd = 0.25
) + scale_fill_viridis(option = "viridis", trans = 'sqrt') +
facet_wrap(~ variable) +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = highway_arc, colour = "#f100e5", fill = NA) +
theme(legend.position = "bottom") +
labs(
fill = "Proportion of households",
title = "Proportion of households by income band in census tracts",
subtitle = "(Data using 5-year estimates from the 2006-2010 ACS)"
) +
theme(legend.key.width = unit(2, 'cm'))ggplot() +
geom_sf(
data = df_rdincome_prop_tracts %>% filter(year == '2019'),
aes(geometry = geometry, fill = estimate),
colour = "grey", lwd = 0.25
) + scale_fill_viridis(option = "viridis", trans = 'log10') +
facet_wrap(~ variable) +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = highway_arc, colour = "#f100e5", fill = NA) +
theme(legend.position = "bottom") +
labs(
fill = "Proportion of households",
title = "Proportion of households by income band in census tracts",
subtitle = "(Data using 5-year estimates from the 2015-2019 ACS)"
) +
theme(legend.key.width = unit(2, 'cm'))df_rdincome_prop_change_tracts <- df_income_tracts %>%
filter(variable != "Total") %>%
select(year, GEOID, variable, estimate) %>%
mutate(
variable = factor(variable, ordered = TRUE, levels = income_var_labels)
) %>%
mutate(variable = fct_collapse(variable,
"Less than $50,000" = c(
"Less than $10,000", "$10,000 to $14,999",
"$15,000 to $19,999", "$20,000 to $24,999",
"$25,000 to $29,999", "$30,000 to $34,999",
"$35,000 to $39,999", "$40,000 to $44,999",
"$45,000 to $49,999"
),
"$50,000 to $74,999" = c("$50,000 to $59,999", "$60,000 to $74,999"),
"$75,000 to $99,999" = c("$75,000 to $99,999"),
"$100,000 to $149,999" = c(
"$100,000 to $124,999", "$125,000 to $149,999"
),
"$150,000 to $199,999" = c("$150,000 to $199,999"),
"$200,000 or more" = c("$200,000 or more")
)) %>%
group_by(year, GEOID, variable) %>%
summarize(estimate = sum(estimate)) %>%
group_by(year, GEOID) %>%
mutate(estimate = estimate / sum(estimate)) %>%
pivot_wider(
names_from = "year",
names_prefix = "estimate_",
values_from = "estimate"
) %>%
mutate(change = estimate_2019 - estimate_2010) %>%
left_join(df_tracts)
df_rdincome_prop_change_tracts %>%
ggplot(aes(x = change)) + geom_histogram() +
facet_wrap(~ variable)ggplot() +
geom_sf(
data = df_rdincome_prop_change_tracts,
aes(geometry = geometry, fill = change),
colour = "grey", lwd = 0.25
) + scale_fill_viridis(option = "turbo") +
facet_wrap(~ variable) +
geom_sf(data = arc_counties_shp, colour = "black", fill = NA) +
geom_sf(data = highway_arc, colour = "#009b00", fill = NA) +
theme(legend.position = "bottom") +
labs(
fill = "Change in proportion of households",
title = "Change in number of households by income band and tract",
subtitle = "(Data using 5-year estimates from the 2006-2010 ACS and 2015-2019 ACS)"
) +
theme(legend.key.width = unit(2, 'cm'))